CBT Campus' Online Skills Training Courses.

Data Wrangling with Python/Pandas

Data Wrangling with Pandas: Advanced Features

Course Number:
it_dsdwppdj_03_enus

Expected Duration (hours)
1.2

Lesson Objectives

Course Overview
perform grouping and aggregations on data
work with multiple, hierarchical indexes
specify grouping and aggregations with multiple indexes
perform general user-defined aggregations
extract subsets of data using filtering
identify kinds of masking operations
troubleshoot data with duplicates
identify how categorical data differs from continuous data
perform filtering operations on categorical data
recognize default and custom indexes and reindex DataFrames
perform filtering operations, drop duplicate data, and work with categories

Overview/Description

This course uses Python, the preferred programming language for data science, to explore Pandas, a popular Python library that is part of the open-source PyData stack. In this 11-video Skillsoft Aspire course, learners will use Pandas DataFrames to perform advanced category grouping, aggregation, and filtering operations. You will see how to use Pandas to retrieve a subset of your data by performing filtering operations on both rows and columns. You will perform analysis on multilevel data by using the groupby operation on DataFrames. You will then learn to use data masking, or data obfuscation, to protect classified or commercially sensitive data. Learners will also work with duplicate data, an important part of data cleaning. You will examine the two broad categories of data: continuous data, which comprises a continuous range of values, and categorical data, which has discrete, finite values. Pandas automatically generates an index for the rows of every DataFrame, and here you will learn to perform different types of reindexing operations on DataFrames.
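The grouping, filtering, duplicate-handling, categorical, and reindexing operations described above can be illustrated with a short sketch. It assumes a recent pandas release is installed; the orders DataFrame, its column names, and its values are invented for illustration and are not taken from the course. The row filter uses a boolean mask.

```python
import pandas as pd

# Hypothetical order data, invented for illustration only.
orders = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West", "East"],
    "product": ["A", "B", "A", "A", "B", "B"],
    "units": [10, 5, 8, 8, 3, 2],
})

# Duplicates: rows 2 and 3 are identical, so drop_duplicates keeps only the first.
deduped = orders.drop_duplicates()

# Grouping on two keys produces a Series with a two-level (hierarchical) index.
totals = deduped.groupby(["region", "product"])["units"].sum()

# A user-defined aggregation: the spread (max - min) of units per region.
spread = deduped.groupby("region")["units"].agg(lambda s: s.max() - s.min())

# Filtering: a boolean mask selects rows, and a list selects columns.
mask = deduped["units"] > 5
large = deduped.loc[mask, ["region", "units"]]

# Categorical data: a finite set of discrete values stored as categories.
regions = deduped["region"].astype("category")

# Reindexing: pivot products into columns, then conform rows to a new label order.
# "North" does not exist in the data, so its row is filled with NaN.
wide = totals.unstack()
reindexed = wide.reindex(["West", "East", "North"])
```

Reindexing never drops or invents data silently: labels missing from the original become NaN rows, which makes gaps easy to spot.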

Target

Prerequisites: none

Data Wrangling with Pandas: Visualizations and Time-Series Data

Course Number:
it_dsdwppdj_02_enus

Expected Duration (hours)
1.5

Lesson Objectives

Course Overview
load and explore the dataset used for visualization
plot pie charts, box plots, and scatter plots using Pandas
identify and work with time-series data
calculate deltas and percentage returns in stock prices
define time deltas and date ranges in Pandas
clean missing data in mismatched DataFrames
identify string data stored in DataFrames
perform advanced manipulations on string data
change column values by applying functions
transform data with user-defined functions
transform all columns in a DataFrame
recall how to plot visuals and transform column values

Overview/Description

This 12-video Skillsoft Aspire course uses Python, the preferred programming language for data science, to explore data in Pandas with popular chart types such as the bar graph, histogram, pie chart, and box plot. Discover how to work with time-series and string data in data sets. Pandas represents data in a tabular format, which makes it easy to perform data manipulation, cleaning, and exploration, all important parts of any data engineer's toolkit. You will learn how to use Matplotlib, a multiplatform data visualization library built on NumPy, the Python library used to work with multidimensional arrays. Learners will use Pandas' features to work with specific kinds of data, such as time-series data and string data. The course includes a real-world demonstration that uses Pandas to analyze Amazon stock market returns. Finally, you will learn how to apply data transformations to clean, format, and transform the data into a useful form for further analysis.
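A minimal sketch of the time-series, missing-data, string, and transformation topics above follows, assuming pandas is installed. The prices are made-up values, not real Amazon quotes, and every variable name is hypothetical. Plotting is left out so the example runs without Matplotlib.

```python
import pandas as pd

# Hypothetical closing prices on five business days (illustrative values only).
dates = pd.date_range("2024-01-01", periods=5, freq="B")
prices = pd.Series([100.0, 110.0, 99.0, 99.0, 108.9], index=dates, name="close")

# Deltas are absolute day-over-day changes; percentage returns are relative changes.
deltas = prices.diff()
returns = prices.pct_change()

# Subtracting two timestamps yields a Timedelta; adding a Timedelta shifts a date.
span = dates[-1] - dates[0]
next_day = dates[-1] + pd.Timedelta(days=1)

# Mismatched indexes: arithmetic aligns on labels and yields NaN for unmatched ones,
# unless fill_value supplies a default for the missing side.
a = pd.Series([1.0, 2.0], index=["x", "y"])
b = pd.Series([10.0, 20.0], index=["y", "z"])
aligned = a.add(b)
filled = a.add(b, fill_value=0)

# Vectorized string methods through the .str accessor.
tickers = pd.Series([" amzn ", "AAPL", "msft "])
clean = tickers.str.strip().str.upper()

# Transform every column with apply: scale each column by its first value.
frame = pd.DataFrame({"open": [100.0, 110.0], "close": [105.0, 99.0]})
normalized = frame.apply(lambda col: col / col.iloc[0])
```

The first entry of both deltas and returns is NaN, because there is no earlier price to compare against.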

Target

Prerequisites: none

Data Wrangling with Pandas: Working with Series & DataFrames

Course Number:
it_dsdwppdj_01_enus

Expected Duration (hours)
1.2

Lesson Objectives

Course Overview
install and work with Pandas
create and configure Pandas Series objects
perform data wrangling operations on Series objects
use appending and sorting operations on Series objects
create and configure Pandas DataFrame objects
perform indexing operations on DataFrames
identify and troubleshoot missing data
work with aggregations on columns
perform statistical operations on DataFrames
recall basic concepts and instantiate Series and DataFrame objects

Overview/Description

Pandas, a popular Python library, is part of the open-source PyData stack. In this 10-video Skillsoft Aspire course, you will learn that Pandas represents data in a tabular format, which makes it easy and intuitive to perform data manipulation, cleaning, and exploration. You will use the Pandas DataFrame, a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). To take this course, you should already be familiar with the Python programming language; all code is written in Jupyter notebooks. You will work with basic Pandas data structures, starting with Pandas Series objects, which represent a single column of data and can store numerical values, strings, Booleans, and more complex data types. Learn how DataFrames represent data in table form. Finally, learn to append and sort Series values, handle missing data, add columns, and aggregate data in a DataFrame. The closing exercise involves instantiating a Pandas Series object by using both a list and a dictionary, changing the Series index to something other than the default, and sorting Series values in place.
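The Series and DataFrame basics above, including the closing exercise, might look like this in practice. This is a sketch assuming pandas 2.x, where Series.append no longer exists, so pd.concat is used to append; all names and values are illustrative only.

```python
import pandas as pd

# A Series built from a list gets the default integer index 0..n-1.
from_list = pd.Series([30, 10, 20])

# A Series built from a dict uses the dict keys as index labels.
from_dict = pd.Series({"b": 2, "a": 1, "c": 3})

# Replace the default index with custom labels.
from_list.index = ["x", "y", "z"]

# Sort values in place: this modifies from_list and returns None.
from_list.sort_values(inplace=True)

# Appending: Series.append was removed in pandas 2.0, so pd.concat is used here.
appended = pd.concat([from_dict, pd.Series({"d": 4})])

# A DataFrame with one missing score (None becomes NaN in a float column).
df = pd.DataFrame({"name": ["Ann", "Bo", "Cy"], "score": [90.0, None, 70.0]})
missing = df["score"].isna().sum()

# Fill the gap with the mean of the non-missing scores.
df["score"] = df["score"].fillna(df["score"].mean())

# Add a derived column, then aggregate a column with several functions at once.
df["passed"] = df["score"] >= 75
summary = df["score"].agg(["mean", "max", "min"])
```

Because sort_values(inplace=True) returns None, its result should not be assigned back to from_list.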

Target

Prerequisites: none

Final Exam: Data Wrangler

Course Number:
it_fedads_02_enus

Expected Duration (hours)
0.0

Lesson Objectives

apply a group by transformation to aggregate with a conditional value
apply grouping and aggregation operations on a DataFrame to analyze categories of data in a dataset
build and run the application and confirm the output using HDFS from both the command line and the web application
change column values by applying functions
change date formats to the ISO 8601 standard
code up a Combiner for the MapReduce application and configure the Driver to use it for a partial reduction on the Mapper nodes of the cluster
compare managed and external tables in Hive and how they relate to the underlying data
configure and test PyMongo in a Python program
configure the Reducer and the Driver for the inverted index application
create and analyze categories of data in a dataset using windows
create and configure Pandas DataFrame objects
create and configure Pandas Series objects
create and instantiate a directed acyclic graph in Airflow
create a Spark DataFrame from the contents of a CSV file and apply some simple transformations on the DataFrame
create the driver program for the MapReduce application
define and run a join query involving two related tables
define a vehicle type that can be used to represent automobiles to be stored in a Java PriorityQueue
define the Mapper for a MapReduce application to build an inverted index from a set of text files
define what a window is in the context of Spark DataFrames and when they can be used
demonstrate how to ingest data using Sqoop
describe data ingestion approaches and compare Avro and Parquet file format benefits
describe the beneficial features that we can achieve using serverless and lambda architectures
describe the data processing strategies provided by MapReduce V2, Hive, Pig, and YARN for processing data with data lakes
describe the different primitive and complex data types available in Hive
extract subsets of data using filtering
flatten multi-dimensional data structures by chaining lateral views
handle common errors encountered when reading CSV data
identify and troubleshoot missing data
identify and work with time-series data
identify kinds of masking operations
implement a multi-stage aggregation pipeline
implement data lakes using AWS
implement deep learning using Keras
install MongoDB and implement data partitioning using MongoDB
list the prominent distributed data models along with their associated implementation benefits
list the various frameworks that can be used to process data from data lakes
load a few rows of data into a table and query it with simple select statements
load multiple sheets from an Excel document
perform create, read, update, and delete operations on a MongoDB document
perform statistical operations on DataFrames
plot pie charts, box plots, and scatter plots using Pandas
recall the prominent data pattern implementations in microservices
recognize the capabilities of Microsoft machine learning tools
recognize the machine learning tools provided by AWS for data analysis
recognize the read and write optimizations in MongoDB
set up and install Apache Airflow
split columns based on a pattern
test Airflow tasks using the airflow command line utility
trim and clean a DataFrame before a view is created as a precursor to running SQL queries on it
use a regular expression to extract data into a new column
use a Spark accumulator as a counter
use createIndex to build an index on a collection
use Maven to create a new project for a MapReduce application and plan out the Map and Reduce phases by examining the auto prices dataset
use the alter table statement to change the definition of a Hive table
use the find operation to select documents from a collection
use the mongoexport tool to export data from MongoDB to JSON and CSV
use the mongoimport tool to import from JSON and CSV
use the UNION and UNION ALL operations on table data and distinguish between the two
work with data in the form of key-value pairs using map data structures in Hive
work with scikit-learn to implement machine learning

Overview/Description

Final Exam: Data Wrangler will test your knowledge and application of the topics presented throughout the Data Wrangler track of the Skillsoft Aspire Data Analyst to Data Scientist Journey.

Target

Prerequisites: none